A Generalized Model for Multimodal Perception
Abstract
For autonomous robots and humans to collaborate effectively on a task, robots need to perceive their environment accurately and consistently with their human teammates. To develop such cohesive perception, robots must also be able to interpret their teammates' descriptions of an environment and combine them with what they perceive through computer vision. In this context, we develop a graphical model that fuses object recognition results from two modalities: computer vision and verbal descriptions. In this paper, we focus specifically on three types of verbal descriptions: egocentric positions, relative positions with respect to a landmark, and numeric constraints. We develop a Conditional Random Field (CRF)-based approach to fuse the visual and verbal modalities, in which n-ary relations (or descriptions) are modeled as factor functions. We hypothesize that human descriptions of an environment will improve a robot's recognition if the information is properly fused. To verify this hypothesis, we apply our model to the object recognition problem and evaluate it on the NYU Depth V2 and Visual Genome datasets. We report results from sets of experiments that demonstrate the significant advantage of multimodal perception, and we discuss potential real-world applications of our approach.
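The abstract does not spell out the model's potentials, so the following is only a minimal sketch, under stated assumptions, of how a CRF can combine per-object vision scores (unary potentials) with factor functions for the three description types. Everything here is hypothetical and not the authors' implementation: the names `egocentric`, `relative`, `count_constraint`, and `infer`; the reward weights; the convention that "left" means a normalized x-coordinate below 0.5; and the example scene and scores. Exact inference by brute-force enumeration stands in for whatever inference method the paper uses, and it is only practical for a handful of objects.

```python
from itertools import product
from math import prod
from typing import Callable, Dict, List, Tuple

Labeling = Tuple[str, ...]
# A factor maps a full joint labeling to a non-negative potential.
Factor = Callable[[Labeling], float]


def egocentric(i: int, target: str, xs: List[float], w: float = 3.0) -> Factor:
    """Hypothetical factor for 'the <target> is on my left': rewards labeling
    object i as target when its normalized x-coordinate is below 0.5."""
    def phi(y: Labeling) -> float:
        return w if (y[i] == target and xs[i] < 0.5) else 1.0
    return phi


def relative(i: int, target: str, j: int, landmark: str,
             xs: List[float], w: float = 3.0) -> Factor:
    """Hypothetical pairwise factor for 'the <target> is right of the <landmark>'."""
    def phi(y: Labeling) -> float:
        return w if (y[i] == target and y[j] == landmark and xs[i] > xs[j]) else 1.0
    return phi


def count_constraint(label: str, n: int, w: float = 5.0) -> Factor:
    """Hypothetical n-ary factor for 'there are n <label>s': reads every variable."""
    def phi(y: Labeling) -> float:
        return w if sum(lbl == label for lbl in y) == n else 1.0
    return phi


def infer(vision: List[Dict[str, float]],
          factors: List[Factor]) -> Tuple[Labeling, List[Dict[str, float]]]:
    """Exact MAP labeling and per-object marginals by enumerating every labeling.
    Cost grows as len(labels) ** len(vision), so this only suits tiny scenes."""
    labels = sorted({lbl for scores in vision for lbl in scores})
    joint: Dict[Labeling, float] = {}
    for y in product(labels, repeat=len(vision)):
        # Unary potentials: vision confidence for each object's label (0 if absent).
        unary = prod(vision[k].get(lbl, 0.0) for k, lbl in enumerate(y))
        joint[y] = unary * prod(f(y) for f in factors)
    z = sum(joint.values())  # partition function
    if z == 0.0:
        raise ValueError("every labeling has zero potential")
    best = max(joint, key=joint.get)
    marginals = [{lbl: 0.0 for lbl in labels} for _ in vision]
    for y, score in joint.items():
        for k, lbl in enumerate(y):
            marginals[k][lbl] += score / z
    return best, marginals


# Hypothetical scene: three detected objects with normalized x-coordinates.
xs = [0.2, 0.6, 0.9]
vision = [
    {"cup": 0.4, "bowl": 0.6},
    {"laptop": 0.8, "book": 0.2},
    {"cup": 0.45, "bowl": 0.55},
]
descriptions = [
    egocentric(0, "cup", xs),             # "The cup is on my left."
    relative(2, "cup", 1, "laptop", xs),  # "A cup is to the right of the laptop."
    count_constraint("cup", 2),           # "There are two cups."
]

vision_only_map, _ = infer(vision, [])
fused_map, fused_marginals = infer(vision, descriptions)
print(vision_only_map)  # ('bowl', 'laptop', 'bowl')
print(fused_map)        # ('cup', 'laptop', 'cup')
```

With vision scores alone, the most probable labeling in this toy scene is (bowl, laptop, bowl); adding the three descriptions shifts it to (cup, laptop, cup). The count description depends on every object's label at once, which is why it is expressed as a single factor over all variables rather than as a unary or pairwise term.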